Spatial Data

Dr. Colin Rundel

Spatial data in R

Packages for geospatial data in R

R has a rich package ecosystem for read/writing, manipulating, and analyzing geospatial data.

Some core packages:

  • sp - core classes for handling spatial data, additional utility functions - Deprecated

  • rgdal - R interface to gdal (Geospatial Data Abstraction Library) for reading and writing spatial data - Deprecated

  • rgeos - R interface to geos (Geometry Engine Open Source) library for querying and manipulating spatial data. Reading and writing WKT. - Deprecated

  • sf - Combines the functionality of sp, rgdal, and rgeos into a single package based on tidy simple features.

  • raster - classes and tools for handling spatial raster data.

  • stars - Reading, manipulating, writing and plotting spatiotemporal arrays (rasters)

See more - Spatial task view

Installing sf

This is the hardest part of using the sf package, difficulty comes from is dependence on several external libraries (geos, gdal, and proj).

  • Windows - installing from source works when Rtools is installed (system requirements are downloaded from rwinlib)

  • MacOS - install dependencies via homebrew: gdal2, geos, proj.

  • Linux - Install development packages for GDAL (>= 2.0.0), GEOS (>= 3.3.0) and Proj.4 (>= 4.8.0) from your package manager of choice.

More specific details are included in the README on github.

Simple Features

Reading, writing, and converting

  • sf
    • st_read() / st_write() - Shapefile, GeoJSON, KML, …
    • read_sf() / write_sf() - Same, supports tibbles …
    • st_as_sfc() / st_as_wkt() - sf <-> WKT
    • st_as_sfc() / st_as_binary() - sf <-> WKB
    • st_as_sfc() / as(x, "Spatial") - sf <-> sp

Example data

nc  = read_sf("data/nc_counties/", quiet=TRUE)
air = read_sf("data/airports/", quiet=TRUE)
hwy = read_sf("data/us_interstates/", quiet=TRUE)
nc
Simple feature collection with 100 features and 8 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -84.32186 ymin: 33.84175 xmax: -75.46003 ymax: 36.58815
Geodetic CRS:  NAD83
# A tibble: 100 × 9
     AREA PERIM…¹ COUNT…² STATE COUNTY FIPS  STATE…³ SQUAR…⁴
    <dbl>   <dbl>   <dbl> <chr> <chr>  <chr> <chr>     <dbl>
 1 0.112     1.61    1994 NC    Ashe … 37009 37         429.
 2 0.0616    1.35    1996 NC    Alleg… 37005 37         236.
 3 0.140     1.77    1998 NC    Surry… 37171 37         539.
 4 0.0891    1.43    1999 NC    Gates… 37073 37         342.
 5 0.0687    4.43    2000 NC    Curri… 37053 37         264.
 6 0.119     1.40    2001 NC    Stoke… 37169 37         456.
 7 0.0626    2.11    2002 NC    Camde… 37029 37         241.
 8 0.115     1.46    2003 NC    Warre… 37185 37         444.
 9 0.143     2.40    2004 NC    North… 37131 37         551.
10 0.0925    1.81    2005 NC    Hertf… 37091 37         356.
# … with 90 more rows, 1 more variable:
#   geometry <MULTIPOLYGON [°]>, and abbreviated variable
#   names ¹​PERIMETER, ²​COUNTYP010, ³​STATE_FIPS, ⁴​SQUARE_MIL

air
Simple feature collection with 940 features and 16 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -176.646 ymin: 17.70156 xmax: -64.80172 ymax: 71.28545
Geodetic CRS:  NAD83
# A tibble: 940 × 17
   AIRPRTX…¹ FEATURE ICAO  IATA  AIRPT…² CITY  STATE STATE…³
       <dbl> <chr>   <chr> <chr> <chr>   <chr> <chr> <chr>  
 1         0 AIRPORT KGON  GON   GROTON… GROT… CT    09     
 2         3 AIRPORT K6S5  6S5   RAVALL… HAMI… MT    30     
 3         4 AIRPORT KMHV  MHV   MOJAVE… MOJA… CA    06     
 4         6 AIRPORT KSEE  SEE   GILLES… SAN … CA    06     
 5         7 AIRPORT KFPR  FPR   ST LUC… FORT… FL    12     
 6         8 AIRPORT KRYY  RYY   COBB C… ATLA… GA    13     
 7        10 AIRPORT KMKL  MKL   MC KEL… JACK… TN    47     
 8        11 AIRPORT KCCR  CCR   BUCHAN… CONC… CA    06     
 9        13 AIRPORT KJYO  JYO   LEESBU… LEES… VA    51     
10        15 AIRPORT KCAD  CAD   WEXFOR… CADI… MI    26     
# … with 930 more rows, 9 more variables: COUNTY <chr>,
#   FIPS <chr>, TOT_ENP <dbl>, LATITUDE <dbl>,
#   LONGITUDE <dbl>, ELEV <dbl>, ACT_DATE <chr>,
#   CNTL_TWR <chr>, geometry <POINT [°]>, and abbreviated
#   variable names ¹​AIRPRTX010, ²​AIRPT_NAME, ³​STATE_FIPS

hwy
Simple feature collection with 233 features and 3 fields
Geometry type: MULTILINESTRING
Dimension:     XY
Bounding box:  xmin: -7472582 ymin: 2911107 xmax: 2443707 ymax: 8208428
Projected CRS: NAD83 / UTM zone 15N
# A tibble: 233 × 4
   ROUTE_NUM DIST_MILES DIST_KM                     geometry
   <chr>          <dbl>   <dbl>        <MULTILINESTRING [m]>
 1 I10          2449.   3941.   ((-1881200 4072307, -187992…
 2 I105           20.8    33.4  ((-1910156 5339585, -191013…
 3 I110           41.4    66.6  ((1054139 3388879, 1054287 …
 4 I115            1.58    2.55 ((-1013796 5284243, -101313…
 5 I12            85.3   137.   ((680741.7 3366581, 682709.…
 6 I124            1.73    2.79 ((1201467 3906285, 1201643 …
 7 I126            3.56    5.72 ((1601502 3829718, 1602136 …
 8 I129            3.1     4.99 ((217446 4705389, 217835.1 …
 9 I135           96.3   155.   ((96922.97 4313125, 96561.8…
10 I15          1436.   2311    ((-882875.7 5602902, -88299…
# … with 223 more rows

sf structure

str(nc)
sf [100 × 9] (S3: sf/tbl_df/tbl/data.frame)
 $ AREA      : num [1:100] 0.1118 0.0616 0.1402 0.0891 0.0687 ...
 $ PERIMETER : num [1:100] 1.61 1.35 1.77 1.43 4.43 ...
 $ COUNTYP010: num [1:100] 1994 1996 1998 1999 2000 ...
 $ STATE     : chr [1:100] "NC" "NC" "NC" "NC" ...
 $ COUNTY    : chr [1:100] "Ashe County" "Alleghany County" "Surry County" "Gates County" ...
 $ FIPS      : chr [1:100] "37009" "37005" "37171" "37073" ...
 $ STATE_FIPS: chr [1:100] "37" "37" "37" "37" ...
 $ SQUARE_MIL: num [1:100] 429 236 539 342 264 ...
 $ geometry  :sfc_MULTIPOLYGON of length 100; first list element: List of 1
  ..$ :List of 1
  .. ..$ : num [1:1030, 1:2] -81.7 -81.7 -81.7 -81.6 -81.6 ...
  ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
 - attr(*, "sf_column")= chr "geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA
  ..- attr(*, "names")= chr [1:8] "AREA" "PERIMETER" "COUNTYP010" "STATE" ...

sf classes

class(nc)
[1] "sf"         "tbl_df"     "tbl"        "data.frame"
class(nc$geometry)
[1] "sfc_MULTIPOLYGON" "sfc"             
class(st_geometry(nc))
[1] "sfc_MULTIPOLYGON" "sfc"             
class(nc$geometry[[1]])
[1] "XY"           "MULTIPOLYGON" "sfg"         

Projections / CRS

st_crs(nc)
Coordinate Reference System:
  User input: NAD83 
  wkt:
GEOGCRS["NAD83",
    DATUM["North American Datum 1983",
        ELLIPSOID["GRS 1980",6378137,298.257222101,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["latitude",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["longitude",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4269]]

st_crs(hwy)
Coordinate Reference System:
  User input: NAD83 / UTM zone 15N 
  wkt:
PROJCRS["NAD83 / UTM zone 15N",
    BASEGEOGCRS["NAD83",
        DATUM["North American Datum 1983",
            ELLIPSOID["GRS 1980",6378137,298.257222101,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4269]],
    CONVERSION["UTM zone 15N",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",0,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",-93,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",0.9996,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",500000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    ID["EPSG",26915]]

Projections

Plotting

Base Plots

plot(nc)

Geometry Plot

plot(st_geometry(nc), axes=TRUE)

Graticules

plot(nc[,"AREA"], axes=TRUE)

Graticules

plot(nc[,"AREA"], graticule=st_crs(nc), axes=TRUE)

Graticules (EPSG:3631)

plot(st_transform(nc[,"AREA"], 3631), axes=TRUE)

Graticules (EPSG:3631)

plot(st_transform(nc[,"AREA"], 3631), graticule=st_crs(nc), axes=TRUE)

Graticules (EPSG:3631)

plot(st_transform(nc[,"AREA"], 3631), graticule=st_crs(3631), axes=TRUE)

ggplot2

ggplot(nc) + 
  geom_sf(aes(fill=AREA))

ggplot2 + projections

ggplot(st_transform(nc, 3631)) + 
  geom_sf(aes(fill=AREA))

ggplot2 + viridis

ggplot(st_transform(nc, 3631)) + 
  geom_sf(aes(fill=AREA)) +
  scale_fill_viridis_c()

ggplot2 + calculations

ggplot(st_transform(nc, 3631)) + 
  geom_sf(aes(fill=AREA/PERIMETER^2)) +
  scale_fill_viridis_c()

Other color palettes (discrete)

Picking palette breaks

Picking palette breaks

Layering maps

Data

par(mar=c(3,3,3,0.1), mfrow=c(1,3))
plot(st_geometry(nc),  axes=TRUE, main="nc")
plot(st_geometry(air), axes=TRUE, pch=16, col="blue", main="air")
plot(st_geometry(hwy), axes=TRUE, col="red", main="hwy")

Base R layers

plot(st_geometry(nc),  axes=TRUE)
plot(st_geometry(air), axes=TRUE, pch=16, col="blue", add=TRUE)
plot(st_geometry(hwy), axes=TRUE, col="red", add=TRUE)

Base R layers - reproject

hwy_ll = st_transform(hwy, st_crs(nc))

plot(st_geometry(nc),  axes=TRUE)
plot(st_geometry(air), axes=TRUE, pch=16, col="blue", add=TRUE)
plot(st_geometry(hwy_ll), axes=TRUE, col="red", add=TRUE)

ggplot layers

ggplot() +
  geom_sf(data=nc) +
  geom_sf(data=air, color="blue") +
  geom_sf(data=hwy, color="red")

ggplot layers - extent

bb = st_bbox(nc)
ggplot() +
  geom_sf(data=nc) +
  geom_sf(data=air, color="blue") +
  geom_sf(data=hwy, color="red") +
  lims(x=bb[c(1,3)], y=bb[c(2,4)])

Plotting MCMC results

Scottish Lip Cancer Data

Neighborhood / weight matrix

A = (st_distance(lip_cancer) |> unclass()) < 1e-6
listw = spdep::mat2listw(A)

A hierachical model for lip cancer

We have observed counts of lip cancer for 56 districts in Scotland. Let \(y_i\) represent the number of lip cancer for district \(i\).

\[\begin{aligned} y_i &\sim \text{Poisson}(\lambda_i) \\ \\ \log(\lambda_i) &= \log(E_i) + x_i \beta + \omega_i \\ \\ \boldsymbol{\omega} &\sim N(\boldsymbol{0},~\sigma^2(\boldsymbol{D}-\phi\,\boldsymbol{A})^{-1}) \end{aligned}\]

where \(E_i\) is the expected counts for each region (and serves as an offset), \(\boldsymbol{D}\) is a diagonal matrix of neighbor counts, and \(\boldsymbold{A}\) is the adjacency matrix.

Data prep & CAR model

X = model.matrix(~scale(lip_cancer$pcaff))
offset = lip_cancer$Expected
y = lip_cancer$Observed
car_model = "
data {
  int<lower=0> N;
  int<lower=0> p;
  int<lower=0> y[N];
  matrix[N,N] A;
  matrix[N,p] X;
  vector[N] offset;
}
transformed data {
  vector[N] nb = A * rep_vector(1, N);
  matrix[N,N] D = diag_matrix(nb);
}
parameters {
  vector[N] w_s;
  vector[p] beta;
  real<lower=0> sigma2;
  real<lower=0,upper=1> phi;
}
transformed parameters {
  vector[N] eta = log(offset) + X * beta + w_s; 
}
model {
  matrix[N,N] Sigma_inv = (D - phi * A) / sigma2;
  w_s ~ multi_normal_prec(rep_vector(0,N), Sigma_inv);

  beta ~ normal(0,10);
  sigma2 ~ cauchy(0,5);
  
  y ~ poisson_log(eta);
}
"

CAR Fitting

car = rstan::stan_model(model_code = car_model)

car_m = rstan::sampling(
  car, iter=5000, cores=4,
  data = list(N=nrow(X), A=A, X=X, p=ncol(X), offset=offset, y=y)
)

\(\hat\lambda\) posterior draws

Prediction summaries

# A tibble: 56 × 6
   .variable     i  mean   q25   q75    sd
   <chr>     <int> <dbl> <dbl> <dbl> <dbl>
 1 eta           1 1.63  1.40  1.88  0.363
 2 eta           2 0.913 0.649 1.19  0.400
 3 eta           3 2.30  2.14  2.47  0.247
 4 eta           4 3.62  3.52  3.73  0.157
 5 eta           5 0.721 0.509 0.947 0.333
 6 eta           6 2.21  2.06  2.38  0.247
 7 eta           7 2.64  2.49  2.80  0.233
 8 eta           8 2.09  1.89  2.31  0.311
 9 eta           9 2.72  2.57  2.89  0.233
10 eta          10 2.91  2.77  3.06  0.213
# … with 46 more rows

Joining draws with sf

(car_lip_cancer = lip_cancer %>% 
  mutate(i = seq_len(n())) %>%
  left_join(
    car_pred %>% filter(.variable == "eta"),
    by = "i"
  ) %>%
  mutate(y_pred = exp(mean)))
Simple feature collection with 56 features and 14 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 7150.759 ymin: 529557.2 xmax: 468393.4 ymax: 1218479
CRS:           +init=epsg:27700 +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs +ellps=airy +towgs84=446.448,-125.157,542.060,0.1502,0.2470,0.8421,-20.4894
First 10 features:
       District id Observed Expected pcaff Latitude
1    Sutherland 12        5      1.8    16    58.06
2         Nairn 13        3      1.1    10    57.47
3     Inverness 19        9      5.5     7    57.24
4  Banff-Buchan  2       39      8.7    16    57.56
5      Bedenoch 17        2      1.1    10    57.06
6    Kincardine 16        9      4.6    16    57.00
7         Angus 21       16     10.5     7    56.75
8        Dundee 50        6     19.6     1    56.45
9       NE Fife 15       17      7.8     7    56.30
10    Kirkcaldy 25       19     15.5     1    56.20
   Longitude  i .variable      mean       q25       q75
1       4.64  1       eta 1.6311044 1.3963210 1.8827401
2       3.98  2       eta 0.9131655 0.6487173 1.1882329
3       4.73  3       eta 2.2966728 2.1352521 2.4702543
4       2.36  4       eta 3.6213156 3.5167991 3.7266633
5       4.09  5       eta 0.7206699 0.5091857 0.9473425
6       3.00  6       eta 2.2148004 2.0552073 2.3847897
7       2.98  7       eta 2.6439220 2.4885623 2.8049091
8       3.20  8       eta 2.0947916 1.8945332 2.3092128
9       3.10  9       eta 2.7245066 2.5715914 2.8863322
10      3.30 10       eta 2.9099389 2.7678132 3.0555052
          sd                       geometry    y_pred
1  0.3629014 MULTIPOLYGON (((254301.9 96...  5.109515
2  0.3996257 MULTIPOLYGON (((292629.6 85...  2.492199
3  0.2468279 MULTIPOLYGON (((214097.9 84...  9.941052
4  0.1568925 MULTIPOLYGON (((351078.2 86... 37.386722
5  0.3326180 MULTIPOLYGON (((239274.9 77...  2.055810
6  0.2472608 MULTIPOLYGON (((395569.4 80...  9.159580
7  0.2333714 MULTIPOLYGON (((379822.4 76... 14.068271
8  0.3112501 MULTIPOLYGON (((357538.2 73...  8.123748
9  0.2334651 MULTIPOLYGON (((324911.9 71... 15.248888
10 0.2130545 MULTIPOLYGON (((343283.3 70... 18.355678

Results

( ggplot(car_lip_cancer) +
    geom_sf(aes(fill=Observed), color=NA) + 
    labs(title="Observed Cases",fill="")) +
( ggplot(car_lip_cancer) +
    geom_sf(aes(fill=y_pred), color=NA) + 
    labs(title="Predicted Cases",fill=""))

Posterior uncertainty